Abstract Body 


Background / Context: 

Language minority students, defined as those who speak a non-English language at home, 
are a fast-growing subpopulation that constituted 21% of school-age children in the U.S. in 2009 
(NCES, 2012). About 75% of these students speak Spanish and are immigrants from Central or 
South America (NCES, 2009). Many language minority students speak English with difficulty 
and therefore lag behind in academic performance. Federal law holds districts 
and schools accountable for students’ progress toward English language proficiency and 
achievement of academic standards. The government allocates funds to support specialized 
English language learning (ELL) services. An English language learner is a student whose 
primary language is not English and whose level of proficiency in English is not sufficient to 
support learning in a regular English language classroom. Such students are entitled to support in 
the classroom until they achieve the level of English proficiency needed for full participation. 

The term “ELL services” encompasses English-as-a-second-language (ESL) programs, bilingual 
education programs, and other types of specialized programs for ELL students. 

There have been intense debates about how many years of ELL services would be 
optimal for English language learners. Researchers have argued that, although it takes only 
several years of exposure to English for immigrant children to approach native-like levels in 
conversational skills, it takes four to nine years (Collier, 1987, 1989) or five to seven years 
(Cummins, 1981) to become proficient in academic English essential for learning in content 
areas (August & Hakuta, 1997). 

According to the nationally representative Early Childhood Longitudinal Study-Kindergarten 
cohort (ECLS-K) data, however, in practice there has been vast variation across 
schools in the average years of ELL services provision, with three years being the mode. States 
and districts vary in how they initially identify students as ELLs and how they determine 
whether a student is ready to exit from an ELL program (Zehler et al., 2003). Students who are 
identified as ELLs are assessed annually for continued eligibility for ELL services, often on the 
basis of their demonstrated oral English proficiency, classroom performance, or grades. A 
student who is considered not in need of ELL services initially, or not in need of 
continued services, under one screening system might be provided with ELL 
services under a different system. In the meantime, parents have the right to choose among 
instructional programs and to remove their child from an ELL program (Zehler et al., 2003). The 
conceivable randomness in ELL services provision to a given student may be attributed to 
teacher discretion, parental preference, shifting assessment criteria, measurement errors, or 
temporal changes in staffing and other local resources. 

Purpose / Objective / Research Question / Focus of Study: 

This study focuses on assessing the contribution of ELL services to Spanish-speaking 
students’ mathematics learning in elementary schools. ELL students tend to have lower average 
math achievement at school entry and throughout elementary school (Fry, 2007; Reardon & 
Galindo, 2009). Only 11 percent of ELL students scored at or above the proficient level on a 
national assessment of mathematics, in comparison with the 36 percent proficient rate among all 
fourth graders (NCES, 2005). Spanish-speaking ELLs generally have lower academic achievement yet 
demonstrate a faster learning rate than Asian and non-Hispanic white elementary students (Han, 
2008). Earlier research has also indicated that ELL services may have differential impacts on 
Spanish-speaking and non-Spanish-speaking language minority students (Hong, 2012) and that 
the former group appears to be more vulnerable than the latter to the lack of school resources 
(Han, 2008). A counterfactual and seemingly politically incorrect question is: Would 
Spanish-speaking ELLs be worse off if they were instead immersed in regular elementary classes 
without ELL services? The answer will supply important empirical evidence shedding light on 
the potential consequences of the highly inconsistent ELL program provision across the country. 

Specifically, given that a considerable proportion of language minority children receive 
no ELL services either when they start kindergarten or beyond kindergarten, we first ask whether 
Spanish-speaking ELL students in kindergarten would be worse off in their math learning if they 
were deprived of ELL services in kindergarten. The next question is whether these same students 
would benefit from a second year of ELL services in first grade in comparison with having 
one year of ELL services in kindergarten only. A third research question is perhaps of higher 
theoretical and policy relevance: Do more than three years of ELL services benefit 
Spanish-speaking students’ math learning more than three or fewer years of services? 

Additionally, we consider heterogeneity within the Spanish-speaking student population. 
Some students are self-made early bilinguals who may acquire English language skills even 
without ELL services; some are program-made early bilinguals who may garner benefit from 
ELL programs and become proficient in English within a relatively short time frame; while some 
other students are late bilinguals who may require sustained support of ELL services. We 
develop a strategy for identifying these subpopulations of students and investigating the optimal 
length of ELL services for each subpopulation. 

Setting: 

We draw data from the nationally representative ECLS-K data set. The students were 
followed from the beginning of kindergarten in 1998 through the end of fifth grade in 2004. 

Population / Participants / Subjects: 

We focus on Spanish-speaking language minority students in US elementary 
schools. We have identified a sample of 2,205 Spanish-speaking language minority students 
entering kindergarten in 1998. Spanish-speaking students on average are perceived to have 
relatively low readiness for school (Crosnoe & Turley, 2011; Fuller et al., 2009) and high risk for 
high school dropout (Driscoll, 1999). In addition to the language barrier and immigration status, 
many of these students are from poor households and have parents with relatively low education 
(Crosnoe, 2006; Hernandez, 1999). 

Intervention / Program / Practice: 

For most Spanish-speaking students in the ECLS-K sample, we have identified their 
multi-year sequences of ELL services from kindergarten to fifth grade on the basis of teacher 
report supplemented by school administrator report on ELL program provision. For example, the 
teachers reported on whether every sampled student received in-class or pull-out ESL 
instruction, bilingual education, or Title I ESL or bilingual education, and whether the teacher 
used Spanish for instruction (see the technical report by West (2013) for details). 

Research Design: 

We conduct secondary analysis of large-scale longitudinal survey data evaluating ELL 
services as time-varying treatments. A major challenge to the evaluation of the causal effects of 
time-varying treatments has long been documented in the epidemiology literature 
(Arrighi & Hertz-Picciotto, 1994). In this context, students who display a higher level of 
proficiency in English at kindergarten entry are less likely to be assigned to an ELL program; 
subsequently, at the end of each school year, those who have gained more proficiency in English 
are more likely to exit the ELL program. Therefore, by comparing the average academic 
outcome of the treated students with that of the untreated, or the average outcome of 
those who have remained in an ELL program with that of those who have already exited, one would 
likely underestimate the potential benefit of ELL services. These dynamic selection processes 
result in an endogeneity problem in causal inference that cannot be easily addressed by most 
methods for statistical adjustment (Hong & Raudenbush, 2008). 

We develop and apply a non-parametric marginal mean weighting through stratification 
(MMWS) strategy to remove selection bias associated with baseline and time-varying covariates. 
Our goal is to approximate a sequential randomized experiment in which students are 
hypothetically assigned at random to either an ELL program or a control condition at the 
beginning of each school year. The causal validity of the results depends primarily on the 
richness of the observed covariates, which are described under Data Collection and Analysis. Our 
key identification assumption is that, given the past observed covariate history, treatment 
history, and outcome history, a student’s current treatment assignment is independent of all 
future potential outcomes, so that the treatment assignment can be viewed as if randomized. When 
this assumption holds, the weighted mean observed outcome of a group experiencing a given 
treatment sequence consistently estimates the population mean outcome associated with that 
treatment sequence (Hong & Raudenbush, 2008; Robins, 1999). By employing a non-parametric 
procedure to estimate the weight, MMWS overcomes important limitations of the 
inverse-probability-of-treatment weighting (IPTW) method well known to epidemiologists (Hernán, 
Brumback, & Robins, 2000) and generates relatively more robust and efficient estimates despite 
possible misspecification of the statistical models for predicting the probability of treatment 
assignment (Hong, 2010). We conduct a similar weighted analysis within each subpopulation of 
students to approximate a sequential randomized block design. 
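
The weighting step for a single time point can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `assign_strata` and `mmws`, the rank-based equal-size strata, and the toy data are all assumptions, and the propensity scores are supplied as made-up numbers rather than estimated from covariates.

```python
from collections import Counter

def assign_strata(scores, n_strata=6):
    """Split units into rank-based strata of near-equal size, labelled 1..n_strata."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores) + 1
    return strata

def mmws(z, strata):
    """Marginal mean weight pr(Z = z) / pr(Z = z | S = s) for every unit."""
    n = len(z)
    n_z = Counter(z)                  # units per treatment level
    n_s = Counter(strata)             # units per stratum
    n_sz = Counter(zip(strata, z))    # units per stratum-by-treatment cell
    return [(n_z[zi] / n) / (n_sz[(si, zi)] / n_s[si]) for zi, si in zip(z, strata)]

# Hypothetical data: eight students, two strata for brevity (the study uses six).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]   # made-up propensity scores
z = [0, 0, 0, 1, 0, 1, 1, 1]                        # 1 = received ELL services
strata = assign_strata(scores, n_strata=2)
weights = mmws(z, strata)
```

In this toy sample the single treated student in the low-propensity stratum receives a weight of 2, while the three untreated students there each receive 2/3, so that within the stratum the weighted treated and untreated groups are the same size, as they would be under a randomized block design.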

Data Collection and Analysis: 

One outcome measure is math direct assessment scores. The assessment was 
administered in English if a student was proficient in English and was administered in Spanish if 
the student was proficient in Spanish but not in English. A second outcome measure is teacher 
rating of student math achievement. Teachers were asked to consider a student’s math skills 
demonstrated in his or her native language if the student did not demonstrate skills in English. 
The direct assessment scores and the teacher ratings both have high reliability and have both 
been vertically scaled over multiple waves of observations. The correlation between the two 
outcome measures is about .5. This is partly because, unlike the math direct assessment, the 
teacher ratings measure not only the products but also the process of a student’s math learning 
and reflect a broader sampling of the most recent math curriculum standards and guidelines. We 
use $Y_t$ to denote a student’s math outcome at time $t$ for $t = -1, 0, 1, 3, 5$, representing the time at 
kindergarten entry, the spring of kindergarten, the spring of first grade, the spring of third grade, 
and the spring of fifth grade, respectively (there was no data collection in second grade or 
fourth grade). Here the value of $t$ corresponds to the grade level. 

Individual and contextual characteristics measured at the beginning of kindergarten, 
denoted by $X_{-1}$, include student demographic characteristics, family characteristics, preschool 
experience, baseline oral English proficiency, and baseline academic and social-emotional skills. 
Baseline covariates also include kindergarten class composition, school composition, and teacher 
characteristics. Time-varying covariates measured in each subsequent wave, denoted by $X_t$ for 
$t = 0, 1, 3, 5$, include the evolving status of student oral English proficiency, academic and 
social-emotional skills, class and school compositions, and teacher characteristics. 

Let $Z_t$ denote the treatment received by a student in year $t$ for $t = 0, 1, 3, 5$. Let $Z_t = 1$ if 
a student received ELL services in year $t$ and $Z_t = 0$ otherwise. Table B.1 explains how we compute 
MMWS to adjust for bias due to selective treatment assignment at each time point. Figure B.1 
provides more details on the sequential stratification of two years of data. 
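
With treatment, outcome, and covariate symbols in hand, the identification assumption stated under Research Design can be written compactly. The following is one possible formalization, offered as a sketch: the potential-outcome notation $Y_{t'}(\bar{z})$, the history shorthand with subscript $<t$ (all waves before $t$), and the generic weight $W^{(\bar{z})}$ are assumptions of this illustration rather than symbols defined in the text.

```latex
% Sequential ignorability at each treatment wave t = 0, 1, 3, 5:
% current treatment is independent of current and future potential outcomes
% given the observed treatment, outcome, and covariate histories.
Z_t \;\perp\!\!\!\perp\; \bigl\{ Y_{t'}(\bar{z}) : t' \ge t \bigr\}
  \;\Bigm|\; \bar{Z}_{<t} = \bar{z}_{<t},\; \bar{Y}_{<t},\; \bar{X}_{<t}

% Consequence: the weighted mean outcome of students observed under sequence \bar{z}
% recovers the population mean potential outcome for that sequence.
E\bigl[\, W^{(\bar{z})}\, Y_t \mid \bar{Z} = \bar{z} \,\bigr] \;=\; E\bigl[\, Y_t(\bar{z}) \,\bigr]
```

For $t = 0$ there is no earlier treatment, so the conditioning set reduces to the kindergarten-entry outcome and covariates. Because MMWS replaces the exact conditional probability with a stratum-level proportion, the second equality holds only approximately in practice.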

We analyze a two-level model and apply MMWS at level 1 for student $i$ at time $t$. For 
example, with the first two years of data, the level-1 model is specified as 

$$
\begin{aligned}
Y_{ti} ={}& \beta_{-1i} + \beta_{0i} L_{0i} I_{ti}(t \geq 0) + \beta_{1i} L_{1i} I_{ti}(t = 1) + \delta^{(1)} I_i(Z_0 = 1) I_{ti}(t = 0) \\
 &+ \bigl[\delta^{(1,0)} I_i(Z_0 = 1) I_i(Z_1 = 0) + \delta^{(0,1)} I_i(Z_0 = 0) I_i(Z_1 = 1) + \delta^{(1,1)} I_i(Z_0 = 1) I_i(Z_1 = 1)\bigr] I_{ti}(t = 1) + \varepsilon_{ti}, \\
 & \varepsilon_{ti} \sim N(0, \sigma^2). \qquad (1)
\end{aligned}
$$

At level 2, we have that $\beta_{-1i} = \gamma_{-1} + u_{-1i}$, $\beta_{0i} = \gamma_0 + u_{0i}$, and $\beta_{1i} = \gamma_1 + u_{1i}$. 
The random coefficients $u_{-1i}$, $u_{0i}$, and $u_{1i}$ are assumed to be multivariate normal. Here 
$\delta^{(1)}$ estimates the effect of having ELL services in kindergarten on the kindergarten outcome; 
$\delta^{(1,0)}$, $\delta^{(0,1)}$, and $\delta^{(1,1)}$ estimate the respective effects on the first grade outcome of having ELL 
services in kindergarten only, in first grade only, and in both kindergarten and first grade. Table 
B.2 illustrates the level-1 data structure with four hypothetical students. We analyze a model 
with an analogous structure for multiple years of data. 

Additionally, we consider three subpopulations of students empirically identified 
according to their potential trajectories of English language development if never treated or 
always treated in kindergarten and first grade. Table B.3 provides a summary. We combine 
MMWS with an extension of the Peters-Belson-Prognostic score method (Belson, 1956; Hansen, 
2008; Peters, 1941) to identify subpopulation membership. 


Findings / Results: 

Table B.4 summarizes the observed frequency of multi-year treatment sequences. We 
have identified at least four years of treatment information for 1,070 students. Among them, 335 
never received ELL services throughout the elementary school years while 440 received more 
than three years of services starting from kindergarten. Among the 1,689 students who have 
treatment information in kindergarten and first grade, about 50% of them received ELL services 
in both years while more than a third of them never received ELL services. More results later. . . 
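
The two proportions quoted above can be checked directly against the two-year (K-1) column of Table B.4. The short Python snippet below is only an arithmetic check; the variable names are illustrative.

```python
# Two-year (K-1) treatment-sequence counts as reported in Table B.4.
k1_counts = {"0-0": 609, "1-0": 140, "1-1": 849, "0-1": 91}

total = sum(k1_counts.values())
shares = {seq: n / total for seq, n in k1_counts.items()}

print(total)
for seq, share in shares.items():
    print(seq, round(share, 3))
```

The "1-1" sequence accounts for roughly half of the 1,689 students and the "0-0" sequence for a little over a third, matching the description in the text.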


Conclusions: 

The findings from this study will inform the theoretical discussion with regard to whether 
four or more years of ELL services on average are necessary to enable Spanish-speaking 
elementary students to become proficient in academic English essential for math learning. Yet a 
one-size-fits-all recipe is naive in practice and often wasteful. Identifying the optimal length of 
ELL services for subpopulations of students therefore has immediate implications for ELL 
resource allocation. Nonetheless, empirical results from quasi-experimental data such as 
ECLS-K may contain remaining selection bias. To replicate the findings and reach more conclusive 
results, experimental designs contrasting the subpopulation-specific optimal treatment 
sequences identified in this study with the dominant sequences in current practice may be 
employed as a natural next step in research. 
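Appendix B. Tables and Figures 


Table B.1. MMWS Computation 


Time 0 
  Propensity score:  $\theta^{(Z_0 = z_0)} = pr(Z_0 = z_0 \mid Y_{-1}, X_{-1})$ 
  Stratification on: $\hat{\theta}^{(Z_0 = z_0)}$ 
  MMWS:              $W^{(z_0)} = \dfrac{pr(Z_0 = z_0)}{pr(Z_0 = z_0 \mid S_0^{(z_0)} = s_0)}$ 

Time 1 
  Propensity score:  $\theta^{(Z_1 = z_1 \mid Z_0 = z_0)} = pr(Z_1 = z_1 \mid z_0, Y_{-1}, Y_0, X_{-1}, X_0)$ 
  Stratification on: $\hat{\theta}^{(Z_1 = z_1 \mid Z_0 = z_0)}$ 
  MMWS:              $W^{(z_0, z_1)} = W^{(z_0)} \times W^{(z_1 \mid z_0)}$, where 
                     $W^{(z_1 \mid z_0)} = \dfrac{pr(Z_1 = z_1 \mid Z_0 = z_0)}{pr(Z_1 = z_1 \mid Z_0 = z_0, S_1^{(z_1 \mid z_0)} = s_1)}$ 

Time 3 
  Propensity score:  $\theta^{(Z_3 = z_3 \mid \bar{z}_{0 \sim 1})} = pr(Z_3 = z_3 \mid \bar{z}_{0 \sim 1}, \bar{Y}_{-1 \sim 1}, \bar{X}_{-1 \sim 1})$ 
  Stratification on: $\hat{\theta}^{(Z_3 = z_3 \mid \bar{z}_{0 \sim 1})}$ 
  MMWS:              $W^{(\bar{z}_{0 \sim 3})} = W^{(z_0)} \times W^{(z_1 \mid z_0)} \times W^{(z_3 \mid z_0, z_1)}$, where 
                     $W^{(z_3 \mid z_0, z_1)} = \dfrac{pr(Z_3 = z_3 \mid Z_0 = z_0, Z_1 = z_1)}{pr(Z_3 = z_3 \mid Z_0 = z_0, Z_1 = z_1, S_3^{(z_3 \mid z_0, z_1)} = s_3)}$ 

Time 5 
  Propensity score:  $\theta^{(Z_5 = z_5 \mid \bar{z}_{0 \sim 3})} = pr(Z_5 = z_5 \mid \bar{z}_{0 \sim 3}, \bar{Y}_{-1 \sim 3}, \bar{X}_{-1 \sim 3})$ 
  Stratification on: $\hat{\theta}^{(Z_5 = z_5 \mid \bar{z}_{0 \sim 3})}$ 
  MMWS:              $W^{(\bar{z}_{0 \sim 5})} = W^{(z_0)} \times W^{(z_1 \mid z_0)} \times W^{(z_3 \mid z_0, z_1)} \times W^{(z_5 \mid z_0, z_1, z_3)}$, where 
                     $W^{(z_5 \mid z_0, z_1, z_3)} = \dfrac{pr(Z_5 = z_5 \mid Z_0 = z_0, Z_1 = z_1, Z_3 = z_3)}{pr(Z_5 = z_5 \mid Z_0 = z_0, Z_1 = z_1, Z_3 = z_3, S_5^{(z_5 \mid z_0, z_1, z_3)} = s_5)}$ 


Note: Here $\bar{z}_{0 \sim 1}$ denotes the observed treatment history $(z_0, z_1)$ and $\bar{z}_{0 \sim 3}$ denotes the observed treatment history $(z_0, z_1, z_3)$. 
Propensity score estimation and stratification are always conducted within a subsample of students who shared the same treatment 
history. Additionally, $\bar{Y}_{-1 \sim 1}$ and $\bar{Y}_{-1 \sim 3}$ denote the outcome history $(Y_{-1}, Y_0, Y_1)$ and $(Y_{-1}, Y_0, Y_1, Y_3)$, respectively; and $\bar{X}_{-1 \sim 1}$ and 
$\bar{X}_{-1 \sim 3}$ denote the covariate history $(X_{-1}, X_0, X_1)$ and $(X_{-1}, X_0, X_1, X_3)$, respectively. 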
Table B.2 


Repeated Observations at Level 1 for a Two-Level Model for Evaluating Two-Year Treatment Sequences 


Student  Y_t  W_t       L_0 I_t(t>=0)  L_1 I_t(t=1)  I(Z_0=1)I_t(t=0)  I(Z_0=1)I_t(t=1)  I(Z_1=1)I_t(t=1)  I(Z_0=1)I(Z_1=1)I_t(t=1) 
1        22   1         0              0             0                 0                 0                 0 
1        32   W^(0)     8              0             0                 0                 0                 0 
1        42   W^(0,0)   8              13            0                 0                 0                 0 
2        20   1         0              0             0                 0                 0                 0 
2        30   W^(1)     8              0             1                 0                 0                 0 
2        45   W^(1,0)   8              13            0                 1                 0                 0 
3        24   1         0              0             0                 0                 0                 0 
3        28   W^(0)     9              0             0                 0                 0                 0 
3        48   W^(0,1)   9              12            0                 0                 1                 0 
4        18   1         0              0             0                 0                 0                 0 
4        30   W^(1)     9              0             1                 0                 0                 0 
4        40   W^(1,1)   9              12            0                 1                 1                 1 
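
The person-period layout of Table B.2 can be generated mechanically from each student's treatment sequence. The Python sketch below follows the column coding shown in the table; the function name `level1_rows`, the dictionary keys, and the use of text labels in place of numeric weights are illustrative assumptions, and the values of L_0 and L_1 are simply taken as given for each student.

```python
def level1_rows(student, y, z0, z1, l0, l1):
    """Build three person-period rows (t = -1, 0, 1) in the column order of Table B.2."""
    # Weight labels only; numeric MMWS values would be computed separately.
    w = ["1", f"W({z0})", f"W({z0},{z1})"]
    rows = []
    for k, t in enumerate((-1, 0, 1)):
        rows.append({
            "student": student,
            "t": t,
            "Y": y[k],
            "W": w[k],
            "L0*I(t>=0)": l0 if t >= 0 else 0,
            "L1*I(t=1)": l1 if t == 1 else 0,
            "I(Z0=1)I(t=0)": int(z0 == 1 and t == 0),
            "I(Z0=1)I(t=1)": int(z0 == 1 and t == 1),
            "I(Z1=1)I(t=1)": int(z1 == 1 and t == 1),
            "I(Z0=1)I(Z1=1)I(t=1)": int(z0 == 1 and z1 == 1 and t == 1),
        })
    return rows

# Hypothetical student 2 from Table B.2: treated in kindergarten only.
rows = level1_rows(2, (20, 30, 45), z0=1, z1=0, l0=8, l1=13)
for row in rows:
    print(list(row.values()))
```

Each student contributes one row per assessment wave, so the kindergarten-entry row carries a weight of 1 and all-zero predictors, and later rows switch on the indicators that match the treatments received up to that wave.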
Table B.3 


Definition of Three Student Subpopulations 


                                Would Become Orally Proficient    Under Treatment 
Subpopulations                  in English in Two Years           Kindergarten    1st Grade 
Self-Made Early Bilinguals      Yes                               No              No 
Program-Made Early Bilinguals   Yes                               Yes             Yes 
Late Bilinguals                 No                                Yes             Yes 
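
The definitions in Table B.3 amount to a simple decision rule on two potential outcomes: whether a student would become orally proficient within two years if never treated, and if always treated, in kindergarten and first grade. The Python sketch below encodes that rule. The function name and arguments are hypothetical, the inputs are assumed to be predicted potential outcomes already in hand, and treating any student who would be proficient without services as self-made is an assumption, since the table does not list a treated outcome for that group.

```python
def classify(proficient_if_never_treated, proficient_if_always_treated):
    """Assign a subpopulation label following the rows of Table B.3."""
    if proficient_if_never_treated:
        return "self-made early bilingual"      # proficient even without ELL services
    if proficient_if_always_treated:
        return "program-made early bilingual"   # proficient only with ELL services
    return "late bilingual"                     # not proficient in two years even with services

labels = [
    classify(True, True),
    classify(False, True),
    classify(False, False),
]
print(labels)
```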
Table B.4 


Frequency of Multi-Year Treatment Sequences 


Six Years (K-5)        Four Years (K-3)       Two Years (K-1)        One Year (K) 
Sequence        N      Sequence        N      Sequence        N      Sequence        N 
0-0-0-0       324      0-0-0         335      0-0           609      0             941 
1-0-0-0       110      1-0-0         117      1-0           140      1           1,097 
1-1-0-0        98      1-1-0         117      1-1           849 
1-1-1-0        53      1-1-1         440 
1-1-1-1       231 
0-1-0-0        24      0-1-0          30      0-1            91 
0-1-1-0         3      0-1-1          31 
0-1-1-1        16 
Total         859      Total       1,070      Total       1,689      Total       2,038 
Figure B.1 


Sequential Stratification of Two Years of Observations 


[Figure: diagram in which the year-0 sample is stratified on $\hat{\theta}^{(z_0)}$ into six strata, $S_0^{(z_0)} = 1, \ldots, 6$, each split into $Z_0 = 0$ and $Z_0 = 1$. Students with $Z_0 = 0$ are then stratified on $\hat{\theta}^{(z_1 \mid z_0 = 0)}$ into six strata, $S_1^{(z_1 \mid z_0 = 0)} = 1, \ldots, 6$, and students with $Z_0 = 1$ are stratified on $\hat{\theta}^{(z_1 \mid z_0 = 1)}$ into six strata, $S_1^{(z_1 \mid z_0 = 1)} = 1, \ldots, 6$; each of these strata is split into $Z_1 = 0$ and $Z_1 = 1$.] 


Note: 

We first stratify the year-0 data on the estimated propensity score $\hat{\theta}^{(z_0)}$ for the year-0 treatment and use $s_0 = 1, \ldots, 6$ to denote 
the six strata. We then compute $W^{(0)} = \dfrac{pr(Z_0 = 0)}{pr(Z_0 = 0 \mid S_0^{(z_0)} = s_0)}$ for a student in stratum $s_0$ whose year-0 treatment is $Z_0 = 0$. The weight is 
$W^{(1)} = \dfrac{pr(Z_0 = 1)}{pr(Z_0 = 1 \mid S_0^{(z_0)} = s_0)}$ for a student in stratum $s_0$ whose year-0 treatment is $Z_0 = 1$. 

For students who had the same year-0 treatment $Z_0 = 0$, we stratify their year-1 observations on the estimated propensity score 
$\hat{\theta}^{(z_1 \mid z_0 = 0)}$ for the year-1 treatment and use $S_1^{(z_1 \mid z_0 = 0)} = s_1$ for $s_1 = 1, \ldots, 6$ to denote the six strata. The weight for the year-2 
observation of a student in stratum $s_1$ with treatment sequence (0,0) is $W^{(0,0)} = W^{(0)} \times W^{(0 \mid 0)}$, where 
$W^{(0 \mid 0)} = \dfrac{pr(Z_1 = 0 \mid Z_0 = 0)}{pr(Z_1 = 0 \mid Z_0 = 0, S_1^{(z_1 \mid z_0 = 0)} = s_1)}$. The weight for a student in the same stratum with treatment sequence (0,1) is 
$W^{(0,1)} = W^{(0)} \times W^{(1 \mid 0)}$, where $W^{(1 \mid 0)} = \dfrac{pr(Z_1 = 1 \mid Z_0 = 0)}{pr(Z_1 = 1 \mid Z_0 = 0, S_1^{(z_1 \mid z_0 = 0)} = s_1)}$. 

The year-1 observations of students who had the same year-0 treatment $Z_0 = 1$ are stratified on the estimated propensity score 
$\hat{\theta}^{(z_1 \mid z_0 = 1)}$, which creates a different set of strata denoted by $S_1^{(z_1 \mid z_0 = 1)} = s_1$ for $s_1 = 1, \ldots, 6$. The weight for the year-2 
observation of a student in stratum $s_1$ with treatment sequence (1,0) is $W^{(1,0)} = W^{(1)} \times W^{(0 \mid 1)}$, where 
$W^{(0 \mid 1)} = \dfrac{pr(Z_1 = 0 \mid Z_0 = 1)}{pr(Z_1 = 0 \mid Z_0 = 1, S_1^{(z_1 \mid z_0 = 1)} = s_1)}$. The weight for the year-2 observation of a student in stratum $s_1$ with treatment sequence (1,1) is 
$W^{(1,1)} = W^{(1)} \times W^{(1 \mid 1)}$, where $W^{(1 \mid 1)} = \dfrac{pr(Z_1 = 1 \mid Z_0 = 1)}{pr(Z_1 = 1 \mid Z_0 = 1, S_1^{(z_1 \mid z_0 = 1)} = s_1)}$. 
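
The two-year weighting described in the note to Figure B.1 can be sketched as follows. This Python example assumes the strata have already been formed (year-0 strata over the whole sample, year-1 strata separately within each year-0 treatment group); the function names and the eight-student toy data are hypothetical and use two strata per step instead of six to stay readable.

```python
from collections import Counter

def ratio_weights(z, s):
    """pr(Z = z) / pr(Z = z | S = s), computed within the units passed in."""
    n = len(z)
    n_z, n_s, n_sz = Counter(z), Counter(s), Counter(zip(s, z))
    return [(n_z[a] / n) / (n_sz[(b, a)] / n_s[b]) for a, b in zip(z, s)]

def two_year_weights(z0, s0, z1, s1):
    """Product of the year-0 weight and the year-1 weight given the year-0 treatment."""
    w0 = ratio_weights(z0, s0)
    w = [None] * len(z0)
    for history in set(z0):
        # Year-1 weights are computed only among students sharing the same Z0.
        idx = [i for i, a in enumerate(z0) if a == history]
        w1 = ratio_weights([z1[i] for i in idx], [s1[i] for i in idx])
        for i, w1_i in zip(idx, w1):
            w[i] = w0[i] * w1_i
    return w

# Hypothetical data for eight students.
z0 = [0, 0, 0, 1, 0, 1, 1, 1]   # kindergarten ELL services
s0 = [1, 1, 1, 1, 2, 2, 2, 2]   # year-0 strata
z1 = [0, 1, 0, 1, 1, 0, 1, 1]   # first-grade ELL services
s1 = [1, 1, 2, 1, 2, 1, 2, 2]   # year-1 strata, defined within each Z0 group
w = two_year_weights(z0, s0, z1, s1)
```

Restricting the second step to students with the same year-0 treatment mirrors the statement in Table B.1 that propensity score estimation and stratification are always conducted within a subsample sharing the same treatment history.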